Results 1 - 20 of 32
1.
Sci Rep ; 14(1): 5791, 2024 03 09.
Article in English | MEDLINE | ID: mdl-38461342

ABSTRACT

Diabetic retinopathy (DR) is a serious ocular complication that threatens a patient's vision and overall health. Automatic DR grading currently relies mainly on deep learning. However, the lesions in DR images are complex, vary in shape and size, and are randomly distributed across the image. As a result, current methods struggle to extract these varied features effectively and to relate lesion information across different regions. To address these shortcomings, we design a multi-scale dynamic fusion (MSDF) module and combine it with graph convolution operations to propose a multi-scale dynamic graph convolutional network (MDGNet). MDGNet first uses convolution kernels of different sizes to extract lesion features of different shapes and sizes, and then automatically learns fusion weights according to the contribution of each feature to grading. Finally, a graph convolution operation links the lesion features in different regions. The proposed method thus combines local and global features effectively, which benefits correct DR grading. We evaluate the method on two publicly available datasets, APTOS and DDR. Extensive experiments demonstrate that MDGNet achieves the best grading results on both datasets and extracts lesion information more accurately and with greater diversity.
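
The weighted fusion step described in this abstract can be illustrated with a small sketch. This is not the MSDF module itself: the abstract does not give its formulation, so the softmax weighting, the function names `softmax` and `dynamic_fusion`, and the toy feature vectors are all assumptions made for illustration. In a real network the branch scores would be learned parameters rather than fixed numbers.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(branch_features, branch_scores):
    """Fuse equally sized feature vectors using softmax-normalized weights.

    branch_features: one feature vector per branch (hypothetically, one per
                     convolution kernel size).
    branch_scores:   one score per branch, standing in for learned weights.
    """
    weights = softmax(branch_scores)
    length = len(branch_features[0])
    fused = [
        sum(w * feats[i] for w, feats in zip(weights, branch_features))
        for i in range(length)
    ]
    return fused, weights


# Three hypothetical branches with equal scores contribute equally.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused, weights = dynamic_fusion(features, [0.0, 0.0, 0.0])
print(fused, weights)
```

Raising one branch's score shifts the fused result toward that branch, which is the sense in which the fusion is dynamic in this sketch.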


Subjects
Diabetes Mellitus; Diabetic Retinopathy; Humans; Diabetic Retinopathy/diagnostic imaging; Eye; Algorithms; Face; Research Design
2.
Sensors (Basel) ; 23(13)2023 Jul 07.
Article in English | MEDLINE | ID: mdl-37448077

ABSTRACT

Although convolutional neural networks (CNNs) have achieved great success in various fields, researchers are still exploring better network models because CNNs have an inherent limitation: convolutional kernels have limited ability to model long-range (remote) dependencies. The transformer, by contrast, has been widely applied to vision and has strong global modeling capability, but its close-range modeling capability is mediocre. The foreground to be segmented in medical images is usually clustered in a small region of the image, while the distance between different categories of foreground is uncertain. Therefore, to obtain an accurate segmentation prediction map, the network needs both a strong ability to learn local details and some ability to model distance. To solve these problems, this paper proposes a remote feature exploration (RFE) module, whose key property is that remote elements can be used to assist in generating local features. In addition, to better verify the feasibility of this approach, a new multi-organ segmentation dataset (MOD) was manually created. Both the MOD and Synapse datasets label eight categories of organs, but some images in the Synapse dataset label only a few of them. The proposed method achieved 79.77% and 75.12% DSC on the Synapse and MOD datasets, respectively, with HD95 scores of 21.75 mm on Synapse and 7.43 mm on MOD.


Subjects
Algorithms; Learning; Electric Power Supplies; Intelligence; Neural Networks, Computer; Image Processing, Computer-Assisted
3.
Expert Syst Appl ; 228: 120389, 2023 Oct 15.
Article in English | MEDLINE | ID: mdl-37193247

ABSTRACT

Recent years have witnessed growing interest in neural network-based medical image classification methods, which have demonstrated remarkable performance. Convolutional neural network (CNN) architectures are commonly employed to extract local features, while the transformer, a more recent architecture, has gained popularity for its ability to explore the relevance of remote elements in an image through a self-attention mechanism. To improve classification accuracy, however, it is crucial to establish both local connectivity and remote relationships between lesion features and to capture the overall image structure. To this end, this paper proposes a network based on multilayer perceptrons (MLPs) that learns the local features of medical images while also capturing overall feature information in both the spatial and channel dimensions, thus using image features effectively. The method has been extensively validated on the COVID19-CT and ISIC 2018 datasets, and the results show that it is more competitive and performs better in medical image classification than existing methods. This suggests that using MLPs to capture image features and establish connections between lesions may provide novel ideas for future medical image classification tasks.

4.
Sci Rep ; 13(1): 6342, 2023 04 18.
Article in English | MEDLINE | ID: mdl-37072483

ABSTRACT

Medical image segmentation provides various effective methods for accuracy and robustness of organ segmentation, lesion detection, and classification. Medical images have fixed structures, simple semantics, and diverse details, and thus fusing rich multi-scale features can augment segmentation accuracy. Given that the density of diseased tissue may be comparable to that of surrounding normal tissue, both global and local information are critical for segmentation results. Therefore, considering the importance of multi-scale, global, and local information, in this paper, we propose the dynamic hierarchical multi-scale fusion network with axial mlp (multilayer perceptron) (DHMF-MLP), which integrates the proposed hierarchical multi-scale fusion (HMSF) module. Specifically, HMSF not only reduces the loss of detail information by integrating the features of each stage of the encoder, but also has different receptive fields, thereby improving the segmentation results for small lesions and multi-lesion regions. In HMSF, we not only propose the adaptive attention mechanism (ASAM) to adaptively adjust the semantic conflicts arising during the fusion process but also introduce Axial-mlp to improve the global modeling capability of the network. Extensive experiments on public datasets confirm the excellent performance of our proposed DHMF-MLP. In particular, on the BUSI, ISIC 2018, and GlaS datasets, IoU reaches 70.65%, 83.46%, and 87.04%, respectively.


Subjects
Neural Networks, Computer; Semantics; Image Processing, Computer-Assisted
5.
Sensors (Basel) ; 23(6)2023 Mar 13.
Article in English | MEDLINE | ID: mdl-36991777

ABSTRACT

At present, convolutional neural networks (CNNs) have been widely applied to the task of skin disease image segmentation due to the fact of their powerful information discrimination abilities and have achieved good results. However, it is difficult for CNNs to capture the connection between long-range contexts when extracting deep semantic features of lesion images, and the resulting semantic gap leads to the problem of segmentation blur in skin lesion image segmentation. In order to solve the above problems, we designed a hybrid encoder network based on transformer and fully connected neural network (MLP) architecture, and we call this approach HMT-Net. In the HMT-Net network, we use the attention mechanism of the CTrans module to learn the global relevance of the feature map to improve the network's ability to understand the overall foreground information of the lesion. On the other hand, we use the TokMLP module to effectively enhance the network's ability to learn the boundary features of lesion images. In the TokMLP module, the tokenized MLP axial displacement operation strengthens the connection between pixels to facilitate the extraction of local feature information by our network. In order to verify the superiority of our network in segmentation tasks, we conducted extensive experiments on the proposed HMT-Net network and several newly proposed Transformer and MLP networks on three public datasets (ISIC2018, ISBI2017, and ISBI2016) and obtained the following results. Our method achieves 82.39%, 75.53%, and 83.98% on the Dice index and 89.35%, 84.93%, and 91.33% on the IOU. Compared with the latest skin disease segmentation network, FAC-Net, our method improves the Dice index by 1.99%, 1.68%, and 1.6%, respectively. In addition, the IOU indicators have increased by 0.45%, 2.36%, and 1.13%, respectively. The experimental results show that our designed HMT-Net achieves state-of-the-art performance superior to other segmentation methods.


Subjects
Electric Power Supplies; Skin Diseases; Humans; Learning; Neural Networks, Computer; Records; Skin Diseases/diagnostic imaging; Image Processing, Computer-Assisted
6.
Sci Rep ; 12(1): 20800, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36460827

ABSTRACT

Typical existing combined-query image retrieval methods use Euclidean distance to measure the distance between samples, and a model trained with the triplet loss function blindly pursues absolute distance between samples, resulting in unsatisfactory image retrieval performance. These methods also rely solely on a Convolutional Neural Network (CNN) to extract reference image features. However, the receptive field of the convolution operation is local, which easily causes the loss of edge feature information in reference images. To address these shortcomings, this paper proposes the following improvements: (1) We propose the Triangle Area Triple Loss Function (TATLF), which adopts Triangle Area (TA) as the measure of sample distance. TA jointly considers the absolute distance and the included angle between samples, so the trained model has better retrieval performance. (2) We combine a CNN with a Transformer to extract local and edge features of reference images simultaneously, which effectively reduces the loss of reference image information. Specifically, the CNN extracts local feature information from reference images, and the Transformer attends to their edge feature information. Extensive experiments on two public datasets, Fashion200k and MIT-States, confirm the excellent performance of our proposed method.
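
To make the idea of a distance measure that reflects both magnitude and included angle concrete, here is a minimal sketch. The abstract does not give the TATLF formula, so everything below is an assumption: the triangle is taken to be the one spanned by two embedding vectors from the origin, with area 0.5 * |u| * |v| * sin(theta), and this area is then substituted for the distance in a standard margin-based triplet loss. The function names and the margin value are hypothetical.

```python
import math


def triangle_area(u, v):
    """Area of the triangle spanned by vectors u and v from the origin.

    Equals 0.5 * |u| * |v| * sin(theta), computed via the identity
    |u|^2 |v|^2 - (u . v)^2 = (|u| |v| sin(theta))^2.
    """
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    uv = sum(a * b for a, b in zip(u, v))
    # Clamp at zero to guard against tiny negative values from rounding.
    return 0.5 * math.sqrt(max(0.0, uu * vv - uv * uv))


def triangle_area_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss using triangle area as the dissimilarity."""
    gap = triangle_area(anchor, positive) - triangle_area(anchor, negative)
    return max(0.0, gap + margin)


# The positive is aligned with the anchor; the negative is orthogonal to it.
loss = triangle_area_triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(loss)
```

One consequence of this assumed formulation is that collinear vectors have zero area regardless of their lengths, so a practical loss would likely need additional terms. The sketch only shows how an angle-aware measure can replace Euclidean distance in the triplet structure.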

7.
PLoS One ; 17(11): e0277578, 2022.
Article in English | MEDLINE | ID: mdl-36409714

ABSTRACT

Skin lesion segmentation has recently become an essential direction in machine learning for medical applications. In a deep learning segmentation network, the convolutional neural network (CNN) uses convolution to capture local information for modeling. However, it ignores the relationship between pixels and still cannot meet the precise segmentation requirements of some complex, low-contrast datasets. Transformers perform well in modeling global feature information, but their ability to extract fine-grained local feature patterns is weak. In this work, the dual coding fusion network architecture of Transformer and CNN (TC-Net) combines local and global feature information more accurately and thereby improves segmentation performance on skin images. The results demonstrate that combining a CNN and a Transformer brings a significant improvement in global segmentation performance and outperforms a pure single-network model. The experimental results and visual analysis on three datasets illustrate the robustness of TC-Net both quantitatively and qualitatively. Compared with Swin UNet on the ISIC2018 dataset, TC-Net improves the Dice index by 2.46% and the JA index by about 4%. On the ISBI2017 dataset, the Dice and JA indices rose by about 4%.


Subjects
Image Processing, Computer-Assisted; Skin Diseases; Humans; Image Processing, Computer-Assisted/methods; Algorithms; Neural Networks, Computer; Skin Diseases/diagnostic imaging; Cluster Analysis
8.
Sensors (Basel) ; 22(18)2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36146132

ABSTRACT

Doctors usually diagnose a disease by evaluating the pattern of abnormal blood vessels in the fundus. At present, the segmentation of fundus blood vessels based on deep learning has achieved great success, but it still faces the problems of low accuracy and capillary rupture. A good vessel segmentation method can guide the early diagnosis of eye diseases, so we propose a novel hybrid Transformer network (HT-Net) for fundus imaging analysis. HT-Net can improve the vessel segmentation quality by capturing detailed local information and implementing long-range information interactions, and it mainly consists of the following blocks. The feature fusion block (FFB) is embedded in the shallow levels, and FFB enriches the feature space. In addition, the feature refinement block (FRB) is added to the shallow position of the network, which solves the problem of vessel scale change by fusing multi-scale feature information to improve the accuracy of segmentation. Finally, HT-Net's bottom-level position can capture remote dependencies by combining the Transformer and CNN. We prove the performance of HT-Net on the DRIVE, CHASE_DB1, and STARE datasets. The experiment shows that FFB and FRB can effectively improve the quality of microvessel segmentation by extracting multi-scale information. Embedding efficient self-attention mechanisms in the network can effectively improve the vessel segmentation accuracy. The HT-Net exceeds most existing methods, indicating that it can perform the task of vessel segmentation competently.


Subjects
Algorithms; Retinal Vessels; Fundus Oculi; Image Processing, Computer-Assisted/methods; Retinal Vessels/diagnostic imaging
9.
Sensors (Basel) ; 22(18)2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36146373

ABSTRACT

The model, Transformer, is known to rely on a self-attention mechanism to model distant dependencies, which focuses on modeling the dependencies of the global elements. However, its sensitivity to the local details of the foreground information is not significant. Local detail features help to identify the blurred boundaries in medical images more accurately. In order to make up for the defects of Transformer and capture more abundant local information, this paper proposes an attention and MLP hybrid-encoder architecture combining the Efficient Attention Module (EAM) with a Dual-channel Shift MLP module (DS-MLP), called HEA-Net. Specifically, we effectively connect the convolution block with Transformer through EAM to enhance the foreground and suppress the invalid background information in medical images. Meanwhile, DS-MLP further enhances the foreground information via channel and spatial shift operations. Extensive experiments on public datasets confirm the excellent performance of our proposed HEA-Net. In particular, on the GlaS and MoNuSeg datasets, the Dice reached 90.56% and 80.80%, respectively, and the IoU reached 83.62% and 68.26%, respectively.
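
Several abstracts in this list report Dice and IoU scores. As a reminder of what these overlap metrics compute, the following sketch evaluates both for a pair of binary masks. The flat-list mask representation, the function name `dice_and_iou`, and the convention of returning 1.0 when both masks are empty are assumptions for illustration. Published results are typically averaged over many images or classes, which this sketch does not do.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 4-pixel masks that overlap in exactly one pixel.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
```

For a single pair of masks the two metrics are linked by Dice = 2 * IoU / (1 + IoU), so either can be derived from the other.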


Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Algorithms; Image Processing, Computer-Assisted/methods
10.
Sci Rep ; 12(1): 16117, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36167743

ABSTRACT

U-Net has become the baseline standard in medical image segmentation tasks, but it has limitations in explicitly modeling long-term dependencies. Transformer can capture long-term relevance through its internal self-attention. However, because Transformer models the correlation of all elements, its awareness of local foreground information is weak. Since medical images are often presented as regional blocks, local information is equally important. In this paper, we propose GPA-TUNet, which considers local and global information jointly. Specifically, we propose a new attention mechanism, group parallel axial attention (GPA), to highlight local foreground information. We then combine GPA with Transformer in the encoder part of the model, which both highlights the foreground information of samples and reduces the negative influence of background information on the segmentation results. We also introduce the sMLP block to improve the global modeling capability of the network; applying it achieves sparse connectivity and weight sharing. Extensive experiments on public datasets confirm the excellent performance of GPA-TUNet. In particular, on the Synapse and ACDC datasets, mean DSC reached 80.37% and 90.37%, and mean HD95 reached 20.55 mm and 1.23 mm, respectively.


Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Agriculture; Image Processing, Computer-Assisted/methods
11.
Sci Rep ; 12(1): 11968, 2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35831628

ABSTRACT

Presently, research on deep learning-based change detection (CD) methods has become a hot topic. In particular, feature pyramid networks (FPNs) are widely used in CD tasks to gradually fuse semantic features. However, existing FPN-based CD methods do not correctly detect the complete change region and cannot accurately locate the boundaries of the change region. To solve these problems, a new Multi-Scale Feature Progressive Fusion Network (MFPF-Net) is proposed, which consists of three innovative modules: Layer Feature Fusion Module (LFFM), Multi-Scale Feature Aggregation Module (MSFA), and Multi-Scale Feature Distribution Module (MSFD). Specifically, we first concatenate the features of each layer extracted from the bi-temporal images with their difference maps, and the resulting change maps fuse richer semantic information while effectively representing change regions. Then, the obtained change maps of each layer are directly aggregated, which improves the effective communication and full fusion of feature maps in CD while avoiding the interference of indirect information. Finally, the aggregated feature maps are layered again by pooling and convolution operations, and then a feature fusion strategy with a pyramid structure is used, with layers fused from low to high, to obtain richer contextual information, so that each layer of the layered feature maps has original semantic information and semantic features of other layers. We conducted comprehensive experiments on three publicly available benchmark datasets, CDD, LEVIR-CD, and WHU-CD to verify the effectiveness of the method, and the experimental results show that the method in this paper outperforms other comparative methods.

12.
Entropy (Basel) ; 24(7)2022 Jul 06.
Article in English | MEDLINE | ID: mdl-35885162

ABSTRACT

Violence detection aims to locate violent content in video frames. Improving the accuracy of violence detection is of great importance for security. However, current methods do not make full use of multi-modal visual and audio information, which limits detection accuracy. We found that the violence detection accuracy for different kinds of videos is related to the change in optical flow. With this in mind, we propose an optical flow-aware multi-modal fusion network (OAMFN) for violence detection. Specifically, we use three different fusion strategies to fully integrate multi-modal features. First, the main branch concatenates RGB features and audio features, and the optical flow branch concatenates optical flow features with RGB features and with audio features, respectively. Then, the cross-modal information fusion module integrates the features of the different combinations and applies weights to them to capture cross-modal information in audio and video. After that, the channel attention module extracts valuable information by weighting the integrated features. Furthermore, an optical flow-aware score fusion strategy is introduced to fuse features of different modalities from the two branches. On the XD-Violence dataset, our multi-modal fusion network yields an AP of 83.09% in offline detection, 1.4% higher than the state-of-the-art methods, and an AP of 78.09% in online detection, 4.42% higher than the state-of-the-art methods.

13.
Sensors (Basel) ; 22(12)2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35746271

ABSTRACT

Different feature learning strategies have enhanced performance in recent deep neural network-based salient object detection. Multi-scale learning and residual learning are two such strategies. However, some problems remain, such as the inability to use multi-scale feature information effectively and the lack of fine object boundaries. To overcome these problems, we propose a feature refined network (FRNet), which includes a novel feature learning strategy that combines the multi-scale and residual learning strategies to generate the final saliency prediction. We introduce spatial and channel 'squeeze and excitation' blocks (scSE) at the side outputs of the backbone, which allow the network to concentrate more on salient regions at various scales. Then, we propose the adaptive feature fusion module (AFFM), which efficiently fuses multi-scale feature information to predict superior saliency maps. Finally, to supervise the network in learning more information about object boundaries, we propose a hybrid loss that contains four fundamental losses and combines the properties of diverse losses. Comprehensive experiments demonstrate the effectiveness of FRNet on five datasets, with competitive results compared to other relevant approaches.


Subjects
Machine Learning; Neural Networks, Computer; Learning
14.
Sensors (Basel) ; 22(12)2022 Jun 19.
Article in English | MEDLINE | ID: mdl-35746407

ABSTRACT

Change detection (CD) is a particularly important task in remote sensing image processing. It is of practical importance when making decisions about transitional situations on the Earth's surface. Existing CD methods focus on the design of the feature extraction network and neglect fusion strategies and attention enhancement for the extracted features, which leads to incomplete boundaries of changed areas and missed detection of small targets in the final change map. To overcome these problems, we propose a hierarchical attention residual nested U-Net (HARNU-Net) for remote sensing image CD. First, the backbone network is composed of a Siamese network and a nested U-Net. We remodel the convolution block in the nested U-Net and propose the ACON-Relu residual convolution block (A-R), which reduces the missed detection rate of the backbone network in small change areas. Second, we propose the adjacent feature fusion module (AFFM). Based on the adjacency fusion strategy, the module effectively integrates the details and semantic information of multi-level features, achieving feature complementarity and spatial mutual enhancement between adjacent features. Finally, we propose the hierarchical attention residual module (HARM), which locally filters and enhances the features in a more fine-grained space to output a better change map. Experiments on three challenging public benchmark datasets, CDD, LEVIR-CD and BCDD, show that our method outperforms several other state-of-the-art methods and performs excellently in F1, IOU and visual image quality.


Subjects
Algorithms; Neural Networks, Computer; Attention; Humans; Image Processing, Computer-Assisted/methods; Remote Sensing Technology
15.
Comput Intell Neurosci ; 2022: 9637460, 2022.
Article in English | MEDLINE | ID: mdl-35586112

ABSTRACT

To address the problem that some current algorithms suffer from the loss of some important features due to rough feature distillation and the loss of key information in some channels due to compressed channel attention in the network, we propose a progressive multistage distillation network that gradually refines the features in stages to obtain the maximum amount of key feature information in them. In addition, to maximize the network performance, we propose a weight-sharing information lossless attention block to enhance the channel characteristics through a weight-sharing auxiliary path and, at the same time, use convolution layers to model the interchannel dependencies without compression, effectively avoiding the previous problem of information loss in channel attention. Extensive experiments on several benchmark data sets show that the algorithm in this paper achieves a good balance between network performance, the number of parameters, and computational complexity and achieves highly competitive performance in both objective metrics and subjective vision, which indicates the advantages of this paper's algorithm for image reconstruction. It can be seen that this gradual feature distillation from coarse to fine is effective in improving network performance. Our code is available at the following link: https://github.com/Cai631/PMDN.


Subjects
Data Compression; Distillation; Algorithms; Image Processing, Computer-Assisted/methods; Neural Networks, Computer
16.
Sci Rep ; 12(1): 7082, 2022 04 30.
Article in English | MEDLINE | ID: mdl-35490175

ABSTRACT

Deep hashing methods are widely applied in image retrieval because of their low storage consumption and fast retrieval speed. However, existing deep hashing methods extract insufficient features when they use a convolutional neural network (CNN) to extract the semantic features of images. Some studies propose adding channel-based or spatial-based attention modules, but embedding these modules into the network can increase model complexity and lead to overfitting during training. In this study, a novel deep parameter-free attention hashing (DPFAH) method is proposed to solve these problems; it places a parameter-free attention (PFA) module in the ResNet18 network. PFA is a lightweight module that defines an energy function to measure the importance of each neuron and infers 3-D attention weights for the feature map in a layer. A fast closed-form solution for this energy function means that the PFA module does not add any parameters to the network. In addition, this paper designs a novel hashing framework that includes a hash code learning branch and a classification branch to exploit more label information. The binary-like codes are constrained by a regularization term to reduce the quantization error in the continuous relaxation. Experiments on CIFAR-10, NUS-WIDE and Imagenet-100 show that the DPFAH method achieves better performance.
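
The abstract does not state the energy function or its closed-form solution. The sketch below therefore follows the closed form used by SimAM, a known parameter-free attention module with the same description (a per-neuron energy function and 3-D weights). Treating PFA as SimAM-like is an assumption, as are the function name, the single flattened channel, and the value of `lam`.

```python
import math


def parameter_free_attention(channel, lam=1e-4):
    """SimAM-style attention over one flattened feature-map channel.

    Each activation gets a weight sigmoid(d / (4 * (var + lam)) + 0.5), where
    d is its squared deviation from the channel mean and var is the channel
    variance. No learnable parameters are involved.
    """
    n = len(channel) - 1
    if n < 1:
        raise ValueError("need at least two activations per channel")
    mean = sum(channel) / len(channel)
    squared_dev = [(x - mean) ** 2 for x in channel]
    variance = sum(squared_dev) / n
    weights = []
    for d in squared_dev:
        inverse_energy = d / (4.0 * (variance + lam)) + 0.5
        weights.append(1.0 / (1.0 + math.exp(-inverse_energy)))
    refined = [x * w for x, w in zip(channel, weights)]
    return refined, weights


# The activation that stands out from its neighbours gets the largest weight.
refined, weights = parameter_free_attention([0.0, 0.0, 0.0, 4.0])
print(weights)
```

Applying the same computation independently to every channel of a feature map gives one weight per activation across height, width and channels, which is the sense of 3-D attention weights in this sketch.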


Subjects
Neural Networks, Computer; Semantics; Attention
17.
Sensors (Basel) ; 22(8)2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35459043

ABSTRACT

Recently, the feedforward architecture of a super-resolution network based on deep learning was proposed to learn the representation of a low-resolution (LR) input and the non-linear mapping from these inputs to a high-resolution (HR) output, but this method cannot completely solve the interdependence between LR and HR images. In this paper, we retain the feedforward architecture and introduce residuals to a dual-level; therefore, we propose the dual-level recurrent residual network (DLRRN) to generate an HR image with rich details and satisfactory vision. Compared with feedforward networks that operate at a fixed spatial resolution, the dual-level recurrent residual block (DLRRB) in DLRRN utilizes both LR and HR space information. The circular signals in DLRRB enhance spatial details by the mutual guidance between two directions (LR to HR and HR to LR). Specifically, the LR information of the current layer is generated by the HR and LR information of the previous layer. Then, the HR information of the previous layer and LR information of the current layer jointly generate the HR information of the current layer, and so on. The proposed DLRRN has a strong ability for early reconstruction and can gradually restore the final high-resolution image. An extensive quantitative and qualitative evaluation of the benchmark dataset was carried out, and the experimental results proved that our network achieved good results in terms of network parameters, visual effects and objective performance metrics.

18.
Sensors (Basel) ; 22(1)2022 Jan 02.
Article in English | MEDLINE | ID: mdl-35009871

ABSTRACT

Recently, many super-resolution reconstruction (SR) feedforward networks based on deep learning have been proposed, and the images they reconstruct achieve convincing results. However, because of their large computation and parameter counts, SR technology is greatly limited on devices with limited computing power. To trade off network performance against the number of network parameters, in this paper we propose an efficient image super-resolution network via Self-Calibrated Feature Fuse, named SCFFN, built from the self-calibrated feature fuse block (SCFFB). Specifically, to recover as much of the high-frequency detail of the image as possible, SCFFB applies self-transformation and self-fusion of features. In addition, to accelerate network training while reducing computational complexity, we use an attention mechanism to design the reconstruction part of the network, called U-SCA. Compared with the existing transposed convolution, it greatly reduces the computational burden of the network without degrading the reconstruction quality. We have conducted thorough quantitative and qualitative experiments on public datasets, and the results show that the network achieves performance comparable to other networks while requiring fewer parameters and computational resources.


Subjects
Algorithms; Image Processing, Computer-Assisted; Magnetic Resonance Imaging
19.
PLoS One ; 17(1): e0262689, 2022.
Article in English | MEDLINE | ID: mdl-35073371

ABSTRACT

Accurate segmentation of retinal vessel images can be used to evaluate and monitor various ophthalmic diseases and can also reflect systemic diseases such as diabetes and blood diseases in a timely manner. The study of retinal vessel image segmentation is therefore of great significance for the diagnosis of vision-threatening diseases. In recent years, convolutional neural networks (CNNs), especially those based on UNet and its variants, have been widely used in various medical image tasks. However, although CNNs have achieved excellent performance, they cannot learn global and long-distance semantic information interaction well because of the local computing characteristics of the convolution operation, which limits the development of medical image segmentation tasks. Transformer, currently popular in computer vision, has global computing features, but because it lacks low-level details, its extraction of local feature information is insufficient. In this paper, we propose the Patches Convolution Attention based Transformer UNet (PCAT-UNet), a U-shaped network based on Transformer with a convolution branch. We use skip connections to fuse the deep and shallow features of both sides. By exploiting the complementary advantages of both sides, we can effectively capture the global dependence relationship and the details of the underlying feature space, thus alleviating the insufficient extraction of retinal microvessel feature information and the low sensitivity caused by pixels being easily predicted as background. In addition, our method enables end-to-end training and rapid inference. Finally, three publicly available retinal vessel datasets (DRIVE, STARE and CHASE_DB1) were used to evaluate PCAT-UNet. The experimental results show that PCAT-UNet achieves good retinal vessel segmentation performance on these three datasets and is superior to other architectures in terms of AUC, Accuracy and Sensitivity. AUC reached 0.9872, 0.9953 and 0.9925, Accuracy reached 0.9622, 0.9796 and 0.9812, and Sensitivity reached 0.8576, 0.8703 and 0.8493, respectively. PCAT-UNet also achieved good results on two other indicators, F1-Score and Specificity.
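
The Accuracy, Sensitivity, Specificity and F1-Score indicators named in this abstract all derive from the pixel-level confusion matrix. The sketch below shows the standard definitions on toy data. The function name `binary_metrics`, the flat-list inputs, and the choice to return 0.0 when a denominator is zero are assumptions for illustration. AUC is omitted because it requires continuous prediction scores rather than thresholded labels.

```python
def binary_metrics(pred, target):
    """Compute confusion-matrix metrics for flat binary predictions and labels."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    tn = sum(1 for p, t in zip(pred, target) if not p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)

    def ratio(numerator, denominator):
        # Returning 0.0 for an empty denominator is a convention chosen here.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on vessel pixels
    specificity = ratio(tn, tn + fp)  # recall on background pixels
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy example: 6 pixels with 2 true positives, 1 false positive,
# 2 true negatives and 1 false negative.
metrics = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
print(metrics)
```

Because vessel pixels are a small minority in fundus images, accuracy can be high even when sensitivity is modest, which is why both are usually reported together.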


Subjects
Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Retinal Vessels/diagnostic imaging; Algorithms; Humans
20.
IEEE Trans Cybern ; 52(4): 2047-2058, 2022 Apr.
Article in English | MEDLINE | ID: mdl-32721911

ABSTRACT

The Kullback-Leibler divergence (KLD), which is widely used to measure the similarity between two distributions, plays an important role in many applications. In this article, we address the KLD metric-learning task, which aims at learning the best KLD-type metric from the distributions of datasets. Concretely, first, we extend the conventional KLD by introducing a linear mapping and obtain the best KLD to well express the similarity of data distributions by optimizing such a linear mapping. It improves the expressivity of data distribution, which means it makes the distributions in the same class close and those in different classes far away. Then, the KLD metric learning is modeled by a minimization problem on the manifold of all positive-definite matrices. To deal with this optimization task, we develop an intrinsic steepest descent method, which preserves the manifold structure of the metric in the iteration. Finally, we apply the proposed method along with ten popular metric-learning approaches on the tasks of 3-D object classification and document classification. The experimental results illustrate that our proposed method outperforms all other methods.
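
For reference, the conventional KLD that this abstract extends can be computed for discrete distributions as in the sketch below. It shows only the standard definition, KL(p || q) = sum_i p_i * log(p_i / q_i). The linear mapping and the manifold optimization described in the abstract are not implemented, and the function name and example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    Both inputs are sequences of probabilities over the same outcomes.
    Terms with p_i == 0 contribute nothing; q_i == 0 where p_i > 0 makes
    the divergence infinite, which is reported here as an error.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            raise ValueError("KL divergence is infinite when q_i = 0 and p_i > 0")
        total += pi * math.log(pi / qi)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))
```

The two printed values differ because the divergence is not symmetric.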


Subjects
Research Design
...